Molecular Analysis 

The present invention relates to a method of determining the sequence or occurrence
frequency of a number of variable gene inserts from a random gene library, which
inserts exhibit a desired specific characteristic, wherein each variable gene insert is
flanked 5' and 3' by known sequences, the method comprising: selecting the number
of inserts by their ability to exhibit the desired specific characteristic; conducting
polymerase chain reaction to amplify the selected number of variable gene inserts to
produce components of a mixed PCR product; ligating the components of the mixed
PCR product to produce a concatenated sequence; and sequencing or determining the
occurrence of the gene inserts in the concatenated sequence.

Background to the invention 

Peptide phage display is a prototypical version of directed in vitro molecular
evolution of a large combinatorial library by sequential rounds of physical selection
and enrichment. Peptide phage display selection methods have established
themselves as powerful tools for the identification of short linear peptide mimetics of
many ligand classes. Variant techniques have also been developed, extending this
methodology to selection from large libraries of oligonucleotides (either random or
constrained), translation-arrested ribosomes, phage-displayed binding proteins (of
which single chain fragments of immunoglobulins are the largest group) and so on.

A major limitation of all these methods is that the complexity and composition of the
selection-evolved sublibrary is assessed by analysis of a very small sample drawn at
random from this sublibrary. In consequence, enrichment is deemed to have been
achieved when a very few, or more usually one, sequence dominates the sample. This
may be acceptable when the evolution is directed by a simple target with one or very
few binding sites, but severely limits the method if the target is complex.

The outcome of repeated rounds of selection from a large random peptide phage
display library (typically beginning with >10^ different phage) is a
reduced-complexity sub-library enriched for sequences showing specific affinity for the
selection matrix. Typically such a sub-library may contain lO^-lO"^ different phage,
but in all published studies the outcome of phage panning is assessed by sequencing
<20 independent clones.

The problems inherent in directed evolution with a complex target have already been
recognised in the field; a theoretical analysis has been presented (Vant Hull et al
(1998) J Mol Biol 278 597-597) and a partial practical solution suggested (Messmer
et al (2000) J Mol Biol 296 821-832). This "iterative panning" solution relies on
completing multiple rounds of directed evolution until a single 'winning' sequence
emerges. This sequence is then prepared as a synthetic peptide and used in the
blocking solution during a repeat of the whole experiment (i.e. a further set of
selections on new target material). Binding of the first class of peptides is now
blocked and a second 'winning' sequence is selected. This process can be continued
indefinitely, each entire round generating one new binding sequence. The method is
very slow, very expensive and probably impossible for many key complex biological
target materials, since it relies on a large supply of functionally homogeneous target
material (tumour tissue, infected cells etc.) for many rounds of selection.

The present invention addresses the problems identified in the prior art.

The Present Invention 

The present invention provides a method of determining the sequence and/or
occurrence frequency of a number of variable gene inserts from a random gene
library, which inserts exhibit a desired specific characteristic, wherein each variable
gene insert is flanked 5' and 3' by known sequences, the method comprising:

selecting the number of inserts by their ability to exhibit the desired specific
characteristic, conducting polymerase chain reaction to amplify the variable gene
inserts to produce components of a mixed PCR product;

ligating the components of the mixed PCR product to produce a concatenated
sequence; and

sequencing or determining the occurrence of the gene inserts in the concatenated
sequence.

In accordance with the invention, the method can be used for determining the
sequence of a number of variable gene inserts or for determining the occurrence
frequency of a number of gene inserts from a gene library. Preferably the gene library
is a peptide phage display library.

In the present invention, the polymerase chain reaction is conducted using primers
complementary to the known sequences 5' and 3' to the variable inserts. Further,
between the step of ligating the components of the mixed PCR product to produce a
concatenated sequence and sequencing or determining the occurrence frequency of
the gene inserts, it is preferable to subclone the size-selected concatenated products
into a convenient vector for production of plasmid DNA suitable for automated
sequencing.

The methodology may include the serial analysis of gene expression (SAGE) as
described in WO 97/10363 or WO 02/010438 which are hereby incorporated by 
reference in their entireties. 

The present invention relates to a random gene library (of any size, including a large
library). The number of variable gene inserts is selected (or evolved) by identifying
those gene inserts which exhibit a desired specific characteristic. This selection may
involve one or more cycles of physical selection and amplification to generate a
sub-library whose gene inserts encode sequences that share some desired specific
characteristic, such as a physical or biological activity. Alternatively, the selection
may be of gene inserts which do not exhibit a specific characteristic, e.g. do not bind
to a particular protein.

In order to determine the number of variable gene inserts according to the present
invention, one or more rounds of selection are carried out. These rounds select gene
inserts which exhibit a desired specific characteristic. These rounds of selection may
reduce the number of different gene inserts from around 10^^ to 10^ by one round; to 10"^ by
the next round. Each round may select for the same or for a different desired
specific characteristic. During any round, it is possible to introduce additional
selection criteria or to "block" any binding by the presence of, for example, one or
more gene, amino acid or protein sequences which may be present in the random
library.

The invention provides a simple and economical way of sequencing the relevant
variable parts of the gene encoding the phage coat protein from a large number
(preferably all) of the phage in the selected sub-library. This is in contrast to the prior
art, which only determined the sequence of a tiny and potentially unrepresentative
sub-set of the library. Furthermore, the present invention is a large-scale unbiased
analysis without plaque selection or phage DNA purification from selected plaques.
The method is achieved by using a polymerase chain reaction with unique primers
lying just 5' and just 3' to the variable insert in the gene encoding phage coat protein.
The reaction is carried out on pooled phage DNA isolated from an aliquot of the
library without plaque purification, and the product therefore contains a proportional
representation of all variable regions in the library or sub-library.

The benefit of the present invention is that each variable region will have an
abundance in the double-stranded DNA product that is proportional to the abundance of
that insert sequence amongst the phage in the selected sub-library. The sub-library is
the selected number of variable gene inserts on which PCR is carried out according to
the present invention. When the components of the mixed PCR product have been
prepared, they are ligated to produce a concatenated sequence. It may be useful or
preferable to digest the components of the mixed PCR product with a very
infrequently cutting restriction endonuclease before ligation to produce concatenated
sequences. Following production of the concatenated sequences, they may preferably
be size selected to around 1.5 kb before being cloned into a convenient plasmid.
Subsequent sequencing of such inserts generates greater than 30 variable insert
sequences per sequencing lane.

The length of each variable gene insert is preferably from 18 to 24 nucleotides. 

Known software, of the kind used to strip the joining sequences automatically out of
continuous DNA sequence and then to identify and tabulate the di-tags, can be used
during serial analysis of ligand-selected peptide display sub-libraries to generate
abundance histograms for all the insert sequences identified.
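
The stripping and tabulation step described above can be illustrated with a minimal
Python sketch. It is not the known software referred to; the joining sequence
GCGGCCGC, the function names and the example reads are all assumptions made for
illustration. The sketch assumes error-free reads, a joining sequence that does not
occur within any insert, and inserts of 18 to 24 nucleotides, and it does not attempt
to handle inserts ligated in the reverse orientation.

```python
from collections import Counter

# Hypothetical joining sequence left between inserts after ligation (an assumption).
JOINER = "GCGGCCGC"


def extract_inserts(read, joiner=JOINER, min_len=18, max_len=24):
    """Split one concatemer read on the joiner and keep fragments of insert length."""
    fragments = read.upper().split(joiner)
    return [f for f in fragments if min_len <= len(f) <= max_len]


def tabulate(reads, joiner=JOINER):
    """Count how often each insert sequence occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(extract_inserts(read, joiner))
    return counts


def histogram_lines(counts):
    """Render a plain-text abundance histogram, most abundant insert first."""
    return [f"{seq} {'#' * n} ({n})" for seq, n in counts.most_common()]


# Invented example inserts (18 nucleotides each) and concatemer reads.
INSERT_A = "ACGTACGTACGTACGTAC"
INSERT_B = "TTGACCTTGACCTTGACC"
reads = [
    JOINER.join([INSERT_A, INSERT_B, INSERT_A]),
    JOINER + INSERT_B + JOINER + INSERT_A + JOINER,
]
counts = tabulate(reads)
for line in histogram_lines(counts):
    print(line)
```

Fragments shorter or longer than the expected insert length (for example residual
vector sequence at the ends of a read) are simply discarded by the length filter.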

The present invention allows the easy analysis of many (often all) of the variable
inserts present in the gene library population. This permits the evolution of the
selection process to be followed much more accurately. It also ensures that consensus
matrix-binding sequences are identified both earlier and more accurately. Also
important is that the method of the present invention overcomes the problems of
clonal dominance due to the emergence of a single family of binding sequences which
prevents analysis of interactions on complex matrices.

The present invention allows the rapid and complete identification of all linear or
cysteine-cyclised peptides that exhibit a specific behaviour permitting gene selection
(either positive or negative). It is also applicable to the classification of all antibody
epitopes in a complex humoral response to a pathogen.

The specific behaviour/characteristic permitting gene selection (also described herein
as a specific characteristic permitting gene selection) may be any, including the fact
that the gene encodes a particular protein which binds to another protein in question.
Alternatively, the gene may encode a protein sequence which only occurs in one state
of tissue in comparison with the same tissue in a different state, for example normal
versus tumour tissue, infected versus non-infected tissue, wild type versus mutant,
healthy versus oxidatively damaged, healthy versus ischaemic, or occurrence during a
particular time zone which is absent at an alternative time zone.

The essence of the proposed invention is a method based on concatenation of short
PCR products for efficient sequencing; this permits the analysis of hundreds if not
thousands of sequences corresponding to peptides selected at each round of a
target-directed evolution from a large combinatorial library. In its simplest form this method
reveals multiple sequence families as they are enriched by selection; a single series of
enrichment experiments generates frequency histograms for all emergent classes of
selected sequence. In this way, multiple binding sites on a complex target are
identified without using the time-consuming and expensive iterative panning
approach.
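
As an illustration of following several sequence families across rounds of
selection, the following Python sketch converts per-round insert counts into relative
frequencies and lists the sequences that have risen by the final round. The labels
pepA to pepD, the counts, the function names and the 0.2 threshold are hypothetical
values chosen for the example, not data or parameters from the present disclosure.

```python
from collections import Counter


def relative_frequencies(counts):
    """Convert raw insert counts for one round into fractions of that round's total."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else {}


def enrichment_table(rounds):
    """Map each sequence to its list of relative frequencies, one entry per round."""
    freqs = [relative_frequencies(c) for c in rounds]
    sequences = sorted(set().union(*freqs))
    return {seq: [f.get(seq, 0.0) for f in freqs] for seq in sequences}


def emerging(table, min_final=0.2):
    """Sequences whose final-round frequency is at least min_final and above round one."""
    return [s for s, fr in table.items() if fr[-1] >= min_final and fr[-1] > fr[0]]


# Invented counts for three successive rounds of selection.
rounds = [
    Counter({"pepA": 1, "pepB": 1, "pepC": 1, "pepD": 1}),
    Counter({"pepA": 3, "pepB": 2, "pepC": 1}),
    Counter({"pepA": 5, "pepB": 4, "pepC": 1}),
]
table = enrichment_table(rounds)
for seq, freqs in table.items():
    print(seq, [round(f, 2) for f in freqs])
print("emerging:", emerging(table))
```

In this invented data set two sequences rise together, which is the situation that
sampling only a handful of clones would tend to miss.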

The present invention can be utilised in various ways. For example, differential
panning on two states (normal versus tumour tissue; infected versus non-infected;
wild-type versus mutant; healthy versus oxidatively damaged; healthy versus
ischaemic, etc.) together with frequency histogram generation on large insert numbers
at each round of panning offers a new type of information. The frequency histograms
of the two independent panning experiments are compared (in a manner analogous to
comparing two SAGE tag profiles, or the microarray binding data from the mRNA
samples). This identifies peptide binders that are state independent (i.e. lying along the
diagonal on the two-state plot) as well as binders that are enriched in one state or the
other. This already enriches the data obtained very substantially. However, by
adding a third dimension that identifies the time of appearance of a given sequence
during the multiple rounds of panning, an entirely new type of information emerges. It
is now possible to perform cluster analysis on points that lie off the diagonal within
this 3-D space to identify groups of weak signals that together offer a discriminant
measure between the two states. To take a specific example, this approach could be
used to search for small groups of peptide mimetics that can together discriminate
between normal and tumour tissue in a way that could not be achieved by analysis of
binding of any single peptide.
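
The two-state comparison with a third "round of first appearance" dimension can be
sketched in Python as follows. This is only an illustration under stated assumptions:
the peptide labels, the counts, the function names and the two-fold ratio used to
decide whether a point lies off the diagonal are all hypothetical, and the cluster
analysis itself is not implemented here.

```python
from collections import Counter


def final_frequencies(rounds):
    """Relative frequency of each observed sequence in the last round of one series."""
    last = rounds[-1]
    total = sum(last.values())
    return {seq: n / total for seq, n in last.items() if n > 0}


def first_round_seen(rounds, seq):
    """1-based number of the first round in which seq was observed, or None."""
    for number, counts in enumerate(rounds, start=1):
        if counts.get(seq, 0) > 0:
            return number
    return None


def two_state_points(rounds_a, rounds_b):
    """Return {seq: (frequency in state A, frequency in state B, earliest round seen)}."""
    freq_a, freq_b = final_frequencies(rounds_a), final_frequencies(rounds_b)
    points = {}
    for seq in sorted(set(freq_a) | set(freq_b)):
        seen = [r for r in (first_round_seen(rounds_a, seq),
                            first_round_seen(rounds_b, seq)) if r is not None]
        points[seq] = (freq_a.get(seq, 0.0), freq_b.get(seq, 0.0), min(seen))
    return points


def classify(freq_a, freq_b, ratio=2.0):
    """Label a point as near the diagonal or enriched in one state (ratio is assumed)."""
    if freq_a >= ratio * freq_b:
        return "enriched in A"
    if freq_b >= ratio * freq_a:
        return "enriched in B"
    return "state-independent"


# Invented counts: state A = normal tissue, state B = tumour tissue, two rounds each.
normal = [Counter({"pepX": 1, "pepY": 1, "pepZ": 2}),
          Counter({"pepX": 5, "pepY": 1, "pepZ": 4})]
tumour = [Counter({"pepX": 1, "pepZ": 1}),
          Counter({"pepX": 4, "pepW": 2, "pepZ": 4})]
points = two_state_points(normal, tumour)
labels = {seq: classify(a, b) for seq, (a, b, _) in points.items()}
for seq, point in points.items():
    print(seq, point, labels[seq])
```

The resulting triples could be passed to any standard clustering routine to look for
groups of off-diagonal points.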



The present invention also relates to a determination of the sequence or occurrence
frequency of a number of variable gene inserts from a gene library obtained by the
method of the present invention.

Claims

1. A method of determining the sequence or occurrence frequency of a number of
variable gene inserts from a random gene library, which inserts exhibit a desired
specific characteristic, wherein each variable gene insert is flanked 5' and 3' by adjacent
known sequences, the method comprising:

selecting the number of inserts by their ability to exhibit the desired specific
characteristic, conducting polymerase chain reaction to amplify the selected number
of variable gene inserts to produce components of a mixed PCR product;

ligating the components of the mixed PCR product to produce a concatenated
sequence; and

sequencing or determining the occurrence of the gene inserts in the concatenated
sequence.

2. A method as claimed in claim 1 wherein the gene library is a peptide phage 
display library. 

3. A method as claimed in claim 1 or claim 2, wherein the components of the
mixed PCR product are digested with a restriction endonuclease before ligation to
produce the concatenated sequence.

4. A method as claimed in any one of claims 1 to 3, wherein the concatenated
sequence is cloned into a plasmid before sequencing.

5. A method as claimed in claim 4, wherein the concatenated sequence is size 
selected to around 1.5 kilobases in length before cloning into the plasmid. 

6. A method as claimed in any one of claims 2 to 5, wherein the length of each
variable gene insert is from 18 to 24 nucleotides.

7. A method as claimed in claim 6, wherein the specific characteristic is one or
more of protein binding or occurrence only in one state of tissue in a comparison with
the same tissue in a different state.

8. A method as claimed in any one of claims 2 to 5, wherein the number of 
variable gene inserts are from phage which do not exhibit a specific characteristic. 

9. A method as claimed in any one of claims 1 to 8, wherein selecting the number
of inserts by their ability to exhibit the desired specific characteristic comprises two or
more rounds of selection based on the ability of the variable gene inserts to exhibit the
desired characteristic.

10. A method as claimed in any one of claims 1 to 9, wherein when more than one
round of selection by the ability of the insert to exhibit the desired characteristic is
carried out, then the different rounds of selection may be of more than one desired
specific characteristic.

11. A determination of the sequence or occurrence frequency of a number of
variable gene inserts from a gene library, obtained by a method according to any one 
of claims 1 to 10. 



